NSF PAR Search | NSF Public Access Repository

PhantomWiki: On-Demand Datasets for Reasoning and Retrieval Evaluation

Gong, Albert; Stankeviciute, Kamile; Wan, Chao; Kabra, Anmol; Thesmar, Raphael; Lee, Johann; Klenke, Julius; Gomes, Carla P; Weinberger, Kilian Q (July 2025, Proceedings of Machine Learning Research)

High-quality benchmarks are essential for evaluating reasoning and retrieval capabilities of large language models (LLMs). However, curating datasets for this purpose is not a permanent solution as they are prone to data leakage and inflated performance results. To address these challenges, we propose PhantomWiki: a pipeline to generate unique and factually consistent document corpora with diverse question-answer pairs. Unlike prior work, PhantomWiki is neither a fixed dataset, nor is it based on any existing data. Instead, a new PhantomWiki instance is generated on demand for each evaluation. We vary the question difficulty and corpus size to disentangle reasoning and retrieval capabilities respectively, and find that PhantomWiki datasets are surprisingly challenging for frontier LLMs. Thus, we contribute a scalable and data leakage-resistant framework for disentangled evaluation of reasoning, retrieval, and tool-use abilities.
Free, publicly-accessible full text available July 16, 2026

Exponential Family Model-Based Reinforcement Learning via Score Matching

Li, Gene; Li, Junbo; Kabra, Anmol; Srebro, Nati; Wang, Zhaoran; Yang, Zhuoran (January 2022, Advances in neural information processing systems)

Full Text Available

Characterizing the Loss Landscape in Non-Negative Matrix Factorization

https://doi.org/10.1609/aaai.v35i8.16836

Bjorck, Johan; Kabra, Anmol; Weinberger, Kilian Q.; Gomes, Carla (May 2021, Proceedings of the AAAI Conference on Artificial Intelligence)

Non-negative matrix factorization (NMF) is a highly celebrated algorithm for matrix decomposition that guarantees non-negative factors. The underlying optimization problem is computationally intractable, yet in practice, gradient-descent-based methods often find good solutions. In this paper, we revisit the NMF optimization problem and analyze its loss landscape in non-worst-case settings. It has recently been observed that gradients in deep networks tend to point towards the final minimizer throughout the optimization procedure. We show that a similar property holds (with high probability) for NMF, provably in a non-worst-case model with a planted solution, and empirically across an extensive suite of real-world NMF problems. Our analysis predicts that this property becomes more likely with a growing number of parameters, and experiments suggest that a similar trend might also hold for deep neural networks, turning increasing dataset sizes and model sizes into a blessing from an optimization perspective.
Full Text Available